Greedy mixture learning for multiple motif discovery in biological sequences
Abstract
MOTIVATION: This paper studies the problem of discovering subsequences, known as motifs, that are common to a given collection of related biosequences. It proposes a greedy algorithm that learns a mixture-of-motifs model through likelihood maximization. The approach sequentially adds a new motif to the mixture model, using a combined scheme of global and local search to initialize its parameters appropriately. In addition, a hierarchical partitioning scheme based on kd-trees is presented for partitioning the input dataset in order to speed up the global search procedure. The proposed method compares favorably with the well-known MEME approach and successfully addresses several of its drawbacks.
RESULTS: Experimental results indicate that the algorithm is advantageous in identifying larger groups of motifs characteristic of biological families with significant conservation. In addition, it offers better diagnostic capabilities by building more powerful statistical motif models with improved classification accuracy.
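The greedy scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a DNA alphabet, a fixed motif width, a fixed background component, and an exhaustive scan over observed substrings as the global search (the kd-tree partitioning is omitted). All function names and parameters (`greedy_motifs`, `pwm_from_seed`, `init_weight`, `strength`, `pseudo`) are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA input; the paper addresses biosequences in general

def windows(seqs, w):
    """All length-w substrings of the input sequences."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]

def background(seqs):
    """Position-independent letter frequencies with add-one smoothing."""
    counts = {a: 1.0 for a in ALPHABET}
    for s in seqs:
        for ch in s:
            counts[ch] += 1.0
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def pwm_from_seed(seed, strength=0.7):
    """Initial motif: each column puts `strength` on the seed's letter."""
    rest = (1.0 - strength) / (len(ALPHABET) - 1)
    return [{a: (strength if a == ch else rest) for a in ALPHABET} for ch in seed]

def component_prob(x, comp):
    """Probability of substring x under one component (a list of column dicts)."""
    p = 1.0
    for j, ch in enumerate(x):
        p *= comp[j][ch]
    return p

def log_likelihood(X, weights, comps):
    return sum(math.log(sum(pi * component_prob(x, c)
                            for pi, c in zip(weights, comps))) for x in X)

def em(X, weights, comps, n_iter=20, pseudo=0.1):
    """Local search: EM updates of the mixing weights and motif columns.
    Component 0 (the background) is kept fixed."""
    w = len(X[0])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each substring
        resp = []
        for x in X:
            joint = [pi * component_prob(x, c) for pi, c in zip(weights, comps)]
            z = sum(joint)
            resp.append([j / z for j in joint])
        # M-step: re-estimate weights and smoothed motif columns
        n_k = [sum(r[k] for r in resp) for k in range(len(comps))]
        weights = [n / len(X) for n in n_k]
        new_comps = [comps[0]]
        for k in range(1, len(comps)):
            cols = []
            for j in range(w):
                counts = {a: pseudo for a in ALPHABET}
                for x, r in zip(X, resp):
                    counts[x[j]] += r[k]
                tot = sum(counts.values())
                cols.append({a: c / tot for a, c in counts.items()})
            new_comps.append(cols)
        comps = new_comps
    return weights, comps

def greedy_motifs(seqs, w, n_motifs, init_weight=0.05):
    """Add one motif component at a time, keeping the best-scoring candidate."""
    X = windows(seqs, w)
    weights, comps = [1.0], [[background(seqs)] * w]
    for _ in range(n_motifs):
        best = None
        for seed in sorted(set(X)):  # global search: every observed substring is a seed
            cand_w = [pi * (1.0 - init_weight) for pi in weights] + [init_weight]
            cand_c = comps + [pwm_from_seed(seed)]
            cand_w, cand_c = em(X, cand_w, cand_c, n_iter=3)  # short run to rank seeds
            ll = log_likelihood(X, cand_w, cand_c)
            if best is None or ll > best[0]:
                best = (ll, cand_w, cand_c)
        _, weights, comps = best
        weights, comps = em(X, weights, comps)  # longer local search on the winner
    return weights, comps

def consensus(comp):
    """Most probable letter in each motif column."""
    return "".join(max(col, key=col.get) for col in comp)
```

Calling `greedy_motifs(seqs, w=6, n_motifs=2)` on a list of DNA strings returns the mixing weights and the components, with the background first and each motif as a list of per-position letter distributions. The exhaustive seed scan is the step that a partitioning scheme such as the kd-tree approach mentioned in the abstract would aim to make cheaper.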
Similar resources
A sequential method for discovering probabilistic motifs in proteins.
OBJECTIVES This paper proposes a greedy algorithm for learning a mixture of motifs model through likelihood maximization, in order to discover common substrings, known as motifs, from a given collection of related biosequences. METHODS The approach sequentially adds a new motif component to a mixture model by performing a combined scheme of global and local search for appropriately initializi...
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
dMotifGreedy: a novel tool for de novo discovery of DNA motifs with enhanced power of reporting distinct motifs
De novo discovery of over-represented DNA motifs is one of the major challenges in computational biology. Although numerous tools are available for de novo motif discovery, many of them are prone to local optima, which may hinder detection of multiple distinct motifs. A greedy-algorithm-based tool named dMotifGreedy was developed. dMotifGreedy begins by searching for ca...
G-SteX: Greedy Stem Extension for Free-Length Constrained Motif Discovery
Most available motif discovery algorithms in real-valued time series find approximately recurring patterns of a known length without any prior information about their locations or shapes. In this paper, a new motif discovery algorithm is proposed that has the advantage of requiring no upper limit on the motif length. The proposed algorithm can discover multiple motifs of multiple lengths at onc...
Relation between weight matrix and substitution matrix: motif search by similarity
MOTIVATION The discovery of patterns shared by several sequences that differ greatly is a basic task in sequence analysis, and still a challenge. Several methods have been developed for detecting patterns. Methods commonly used for motif search include the Gibbs sampler, Expectation-Maximization (EM) algorithm and some intuitive greedy approaches. One cannot guarantee the optimality of the resu...
Journal: Bioinformatics
Volume 19, Issue 5
Publication date: 2003